Abstract

Background: Traditionally, heritability and other genetic parameters are estimated from between-family variation. 
With the advent of dense genotyping, it is now possible to compute the proportion of the genome that is shared 
by pairs of sibs and thus undertake the estimation within families, thereby avoiding environmental covariances of 
family members. Formulae for the sampling variance of estimates have been derived previously for families with 
two sibs, which are relevant for humans, but sampling errors are large. In livestock and plants much larger families 
can be obtained, and simulation has shown that sampling variances are then much smaller.

Methods: Based on the assumptions that realised relationships of sibs can be obtained from genomic data and that
data are analyzed by restricted maximum likelihood, formulae were derived for the sampling variance of the 
estimates of genetic variance for arbitrary family sizes. The analysis used statistical differentiation, assuming the 
variance of relationships is small. 

Results: The variance of the estimate of the additive genetic variance was approximately proportional to $1/(fn^2\sigma_R^2)$,
for $f$ families of size $n$ and variance of relationships $\sigma_R^2$.

Conclusions: Because the standard error of the estimate of heritability decreased in proportion to family size, the 
use of within-family information becomes increasingly efficient as the family size increases. There are, however,
limitations, such as near complete confounding of additive and dominance variances in full-sib families.



Background 

Quantitative genetic parameters such as heritability have traditionally been estimated from the variation among full- or half-sib families, or from the parent-offspring covariance [1,2]. The covariance among sibs is assumed to be proportional to the pedigree relationship, but relatives may be further correlated because they share a common environment. This problem arises particularly in humans and, although sire families can be used in livestock to minimise the environmental covariance of sibs, these and weaker relationships come at the cost of higher sampling errors of heritability estimates, because the correlation between sibs has to be multiplied by the inverse of the relationship to obtain an estimate of heritability. Estimates of heritability from non-pedigreed populations also rely heavily on obtaining good estimates of pedigree relationship [3], which is difficult unless relationships are very close, and environmental confounding can still be a source of bias.
Although pairs of full-sibs, for example, share half their genome on average, individual pairs do not, because of Mendelian sampling of large chromosome segments. Such variation in sharing among sib pairs at individual loci is the basis of QTL (quantitative trait locus) mapping using, for example, the method of Haseman and Elston [4], which relates the phenotypic divergence between sibs to their sharing of marker alleles. Dense marker panels are now available, and Visscher et al. [5] proposed that the actual or realised relationships between sibs can be estimated from genomic data, and the association between the actual relationship and phenotypic similarity used to estimate the genetic covariance within families, thereby eliminating correlations due to shared environment. Visscher and colleagues used data on human dizygotic twins and full-sibs, first from microsatellites [5] and subsequently from SNPs (single nucleotide polymorphisms) [6], to estimate the level of genome sharing and thus trait heritability. In a later paper, Visscher [7] discussed the theory further. However, the sampling error of the estimates of genetic variance was high because the variation in actual relationship was small (typical standard deviation (SD) of 3.9% about the mean of 50% for human full-sibs, as expected from theory [5,7-10]). Since family sizes in humans are also very small, many families are needed for precise estimation.

Ødegård and Meuwissen [11] pointed out that the method of Visscher et al. [5] could be used in very large families, such as those of fish species, for which it is not always practical to avoid rearing full-sibs together. They showed by simulation that sampling errors of the resulting estimates of heritability are substantially reduced as family size increases, and are smaller with a few large families than with many small families. These results raise the following basic question: for a family of $n$ sibs, is the information content, i.e. the inverse of the sampling variance of the estimate of heritability, approximately proportional to family size $n$ (or e.g. to $n-1$) or to the number of pairs in the family, $\tfrac12 n(n-1)$? The simulation results of Ødegård and Meuwissen [11] indicated the latter. Furthermore, PM Visscher (personal communication) showed that, using genomic relationships estimated from a sample of $N$ individuals from the population, the sampling variance is a function of $N^2$. The difference between methods whose sampling variances depend approximately on the square of the number of individuals, rather than on the number itself, is not trivial and clearly has an important impact on their design and potential utility.

The model used by Ødegård and Meuwissen [11] was based on a finite number (80) of genomic blocks that were individually marked, with trait effects that were identically normally distributed for each block. In this note, we quantify the sampling errors of these estimates and show how they depend on the design and on the variation in realised relationships. We adopt a model in which the realised relationship is continuous over the genome and trait effects are uniformly distributed across the genome. To calculate sampling errors, Visscher et al. [5] used regression of the squared phenotypic difference of sibs on the estimated actual relationship obtained by tracking genome segments, whereas Ødegård and Meuwissen [11] used a REML (restricted maximum likelihood) analysis within and between families with estimated realised relationships for a finite number of genome segments. In the present analysis the data are assumed to be analysed by REML. Implications for the design of experiments are discussed.



Analysis 

Let us assume that the data are from matings of unrelated individuals and comprise $f$ (> 1) families each of size $n$ (≥ 2). The extension to variable $n$ is straightforward and deferred for now. The mean (i.e. pedigree) numerator relationship within families is $A$ (e.g. 0.25 for half-sibs or 0.5 for full-sibs) and the within-family variance of actual relationships is $\sigma_R^2$. We also assume that all sibs share the same environment and, for simplicity, as in the work of Visscher et al. [5,6], that additive genetic variance is estimated using only within-family differences; in essence, family effects are regarded as fixed. Therefore information is accumulated independently across families, and no bias or sampling error arises due to common environment, albeit at the cost of losing potential between-family genetic information.

Additive model 

Initially, we assumed that gene effects were additive, but subsequently extended the results to include dominance. The additive genetic variance is $\sigma_A^2$, the residual environmental variance is $\sigma_E^2$, and so the within-family variance is $\sigma_W^2 = (1-A)\sigma_A^2 + \sigma_E^2$. The phenotypic variance is given by $\sigma_P^2 = A\sigma_A^2 + \sigma_C^2 + \sigma_W^2$, where $\sigma_C^2$ is the variance due to common environment. In the analysis, it is convenient to parameterise the actual relationship between family members $i$ and $j$ in terms of deviations from the mean pedigree-based relationship: $r_{ij} = A_{ij} - A$. The $n \times n$ covariance matrix $\mathbf{V}$ of observations $\mathbf{y}$ within a family of $n$ sibs is then $\mathrm{var}(\mathbf{y}) = \mathbf{V} = \mathbf{I}\sigma_W^2 + \mathbf{R}\sigma_A^2$, where $\mathbf{I}$ is the identity matrix and the elements of $\mathbf{R}$ are $r_{ij}$, $i \neq j$, and $r_{ii} = 0$.

The sampling variance of the parameter estimates can be approximated by using a Taylor series expansion in $r_{ij}$, because these deviations are small, and then taking expectations so as to obtain Fisher's information matrix $\mathbf{S}$ (the inverse of the variance-covariance matrix) for the REML estimates of the variance components $\sigma_A^2$ and $\sigma_W^2$. The derivation is rather complicated, so details are given in Appendix 1. For a family of size $n$ it is shown that:

$$\mathbf{S} \approx \frac{n-1}{2\sigma_W^4}\begin{pmatrix} m\sigma_R^2 & -2m\sigma_A^2\sigma_R^2/\sigma_W^2 \\ -2m\sigma_A^2\sigma_R^2/\sigma_W^2 & 1 + 3m\sigma_A^4\sigma_R^2/\sigma_W^4 \end{pmatrix} \tag{1}$$

where $m = n(1 - 2/n + 2/n^2)$. Since between-family relationships are not used, information is merely summed over families, with $\mathbf{S}_k$ given by (1) for family $k$ of size $n_k$, $k = 1, \ldots, f$. The overall variance-covariance matrix of the estimates is:

$$\mathbf{C} = \Big(\sum_{k=1}^{f}\mathbf{S}_k\Big)^{-1}. \tag{2}$$
The estimate of the environmental variance is $\hat\sigma_E^2 = \hat\sigma_W^2 - (1-A)\hat\sigma_A^2$; hence, for full-sibs ($A = \tfrac12$), $\mathrm{var}(\hat\sigma_E^2) = c_{22} - c_{12} + \tfrac14 c_{11}$ and $\mathrm{cov}(\hat\sigma_A^2, \hat\sigma_E^2) = c_{12} - \tfrac12 c_{11}$, where $c_{ij}$ are elements of $\mathbf{C}$. Taking just $\sigma_A^2$ and $\sigma_E^2$ into account, $\hat\sigma_P^2 = \hat\sigma_A^2 + \hat\sigma_E^2$, and the sampling error of the corresponding heritability estimate, $\hat h^2 = \hat\sigma_A^2/\hat\sigma_P^2$, can be approximated using standard formulae for ratios (see e.g. page 818 in [2]). Between-family information, not included in the data used above, has to be incorporated to estimate the phenotypic variance and heritability if common family environment or non-additive effects are to be allowed for.
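The bookkeeping in Equations (1) and (2), and the derived variances just described, can be sketched numerically. This is a minimal illustration under stated assumptions: equal-sized full-sib families, no common environment, and an illustrative $\sigma_R^2 = 0.00153$. The function names are hypothetical, and the delta-method step for $\hat h^2$ is one standard way of applying the ratio formula, not necessarily the exact route taken in [2].

```python
import numpy as np


def info_matrix_family(n, s2a, s2w, s2r):
    """Approximate per-family information matrix S of Equation (1).

    Parameters are ordered (sigma_A^2, sigma_W^2); s2r is the variance of
    realised relationships about their pedigree mean.
    """
    m = n * (1.0 - 2.0 / n + 2.0 / n**2)
    s11 = m * s2r
    s12 = -2.0 * m * s2a * s2r / s2w
    s22 = 1.0 + 3.0 * m * s2a**2 * s2r / s2w**2
    return (n - 1) / (2.0 * s2w**2) * np.array([[s11, s12], [s12, s22]])


def sampling_covariance(family_sizes, s2a, s2w, s2r):
    """Equation (2): sum the information over families, then invert."""
    S = sum(info_matrix_family(n, s2a, s2w, s2r) for n in family_sizes)
    return np.linalg.inv(S)


def derived_errors(C, s2a, s2w, A):
    """var of sigma_E^2-hat = sigma_W^2-hat - (1 - A) sigma_A^2-hat, and a
    delta-method SE of h^2-hat = sigma_A^2 / (sigma_W^2 + A sigma_A^2)."""
    g_e = np.array([-(1.0 - A), 1.0])
    var_e = g_e @ C @ g_e
    denom = s2w + A * s2a
    g_h = np.array([s2w, -s2a]) / denom**2   # gradient of h^2 w.r.t. (s2a, s2w)
    return var_e, float(np.sqrt(g_h @ C @ g_h))


# Illustrative values: 20 full-sib families of 50, sigma_P^2 = 1, h^2 = 0.5.
A, s2a, s2e, s2r = 0.5, 0.5, 0.5, 0.00153
s2w = (1.0 - A) * s2a + s2e
C = sampling_covariance([50] * 20, s2a, s2w, s2r)
var_e, se_h2 = derived_errors(C, s2a, s2w, A)
print("var(sigma_A^2-hat):", C[0, 0])
print("var(sigma_E^2-hat):", var_e, " SE(h^2-hat):", se_h2)
```

For a positive-definite $\mathbf{S}$ the full inverse always gives $c_{11} \ge 1/s_{11}$, so dropping the off-diagonal term can only understate $\mathrm{var}(\hat\sigma_A^2)$.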

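If the quantity $m\sigma_A^4\sigma_R^2/\sigma_W^4$ is small, the determinant of $\mathbf{S}$ is dominated by its diagonal elements and $\mathrm{var}(\hat\sigma_A^2)$ simplifies to:

$$\mathrm{var}(\hat\sigma_A^2) \approx 1/s_{11} = \frac{2\sigma_W^4}{f(n-1)m\sigma_R^2} \tag{3}$$

where $s_{11}$ is the leading element of $\sum_k \mathbf{S}_k$. Hence for families of $n = 2$ individuals, $m = 1$ and $\mathrm{var}(\hat\sigma_A^2) \approx 2\sigma_W^4/(f\sigma_R^2)$. This corresponds to the formula of Visscher et al. [5] for the sampling variance of the heritability estimate, $2(1-t)^2/(f\sigma_R^2)$, where $t$ is the intraclass correlation of family members. As $n$ increases, $m(n-1) = n(n - 3 + 4/n - 2/n^2) \to n(n-3) \to n^2$. If $\sigma_R^2$ is small and $n$ large, then $\mathrm{var}(\hat\sigma_A^2) \approx 2\sigma_W^4/(fn^2\sigma_R^2)$.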
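The limiting behaviour of $(n-1)m$ can be checked with exact rational arithmetic. This minimal sketch uses only Python's `fractions` module; the helper names are hypothetical. It confirms that $m = 1$ when $n = 2$, that $(n-1)m = n(n - 3 + 4/n - 2/n^2)$ for every $n$ tested, and prints $(n-1)m/n^2$, which approaches 1 as $n$ grows.

```python
from fractions import Fraction


def m_coef(n):
    """m = n(1 - 2/n + 2/n^2) as an exact fraction."""
    n = Fraction(n)
    return n * (1 - 2 / n + 2 / n**2)


def pair_factor(n):
    """(n - 1) m, the family-size factor in Equation (3)."""
    return (n - 1) * m_coef(n)


# Expanded form used in the text.
for n in range(2, 101):
    assert pair_factor(n) == n * (n - 3 + Fraction(4, n) - Fraction(2, n**2))

for n in (2, 5, 20, 50, 200):
    pf = pair_factor(n)
    print(n, pf, float(pf) / n**2)
```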

The variation in relationships within a family depends on whether family members are full- or half-sibs, on the total map length ($L$) of the chromosomes and, to a limited extent, on their individual lengths [5,7,10]. To a good approximation, $\sigma_R^2 \approx 1/(16L) - 1/(3L^2)$ for full-sibs and one-half of that for half-sibs [5,7]. For humans, the number of autosomes is 22 and the total map length is 35.9 M, so $\sigma_R^2$ is approximately 0.00153 for full-sibs and 0.00077 for half-sibs (SD = 0.039 and 0.028). Therefore, for full-sib families of a species with a map length and chromosome number similar to those of humans, $\mathrm{SE}(\hat\sigma_A^2) \approx 36\sigma_W^2/\sqrt{fn(n-3)}$, e.g. $0.28\sigma_W^2$ for 50 families of size 20 and $0.17\sigma_W^2$ for 20 families of size 50. Cattle, for example, have 29 autosomes and a map length of 32.5 M [12], so $\sigma_R^2$ would be a little larger and the sampling variance of estimates of heritability correspondingly smaller.
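The two worked standard errors can be reproduced from Equation (3). The sketch below takes $\sigma_R^2 = 0.00153$ for human full-sibs directly from the text rather than recomputing it, and reports $\mathrm{SE}(\hat\sigma_A^2)$ in units of $\sigma_W^2$; the helper name is hypothetical.

```python
import math


def se_sigma2_a(f, n, s2r, s2w=1.0):
    """SE of sigma_A^2-hat from Equation (3): sqrt(2 s2w^2 / (f (n-1) m s2r))."""
    m = n * (1.0 - 2.0 / n + 2.0 / n**2)
    return math.sqrt(2.0 * s2w**2 / (f * (n - 1) * m * s2r))


S2R_FULL_SIB_HUMAN = 0.00153          # value quoted in the text
for f, n in [(50, 20), (20, 50)]:
    exact = se_sigma2_a(f, n, S2R_FULL_SIB_HUMAN)
    rough = 36.0 / math.sqrt(f * n * (n - 3))   # the 36 / sqrt(fn(n-3)) shortcut
    print(f"{f} families of {n}: SE = {exact:.3f} (shortcut {rough:.3f}) x sigma_W^2")
```

The constant 36 in the shortcut is simply $\sqrt{2/0.00153} \approx 36.2$, with $(n-1)m$ replaced by $n(n-3)$.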

Simulation check on approximations 

In the analysis in Appendix 1, many simplifying assumptions were made in the Taylor series analysis. As a partial check, simulation was undertaken for a model of 22 chromosomes, each 1.632 M long, i.e. the mean length of human chromosomes, and relationships were simulated with the programme used previously to check formulae for variance in relationships [10]. (The distribution of relationships would be little affected if map lengths varied [10].) The information matrix $\mathbf{S}$ was then computed directly from Equation (A1) and from the approximation in Equation (1). For simplicity, however, it was assumed that the contrast matrix $\mathbf{K}$ (see below Equation (A1)) was invariant (see examples in Table 1). In general, there was good agreement between the observed and the approximate predicted sampling variances (Table 1), but this deteriorated as family size increased, with the approximation generally underestimating the sampling variance. This bias would be greater if $\sigma_R^2$ were higher. If only a single chromosome is fitted, $\sigma_R^2$ is much greater, but the additive variance contributed by that chromosome is only a fraction of the total and, as the example in Table 1 shows, the approximation remains good. Table 1 also gives predictions based solely on Equation (3), showing a good fit with those obtained directly from Equation (2).
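A small-scale version of this check can be sketched as follows. It is not the programme of [10] and is not expected to reproduce Table 1: it assumes Haldane's map function, 22 chromosomes of 1.632 M discretised on a 1 cM grid, full-sib families of five, illustrative variances ($\sigma_A^2 = \sigma_E^2 = 0.5$), and the Helmert contrasts of Appendix 1. All function names are hypothetical. The exact information of Equation (A1) is averaged over simulated families and the resulting $\mathrm{var}(\hat\sigma_A^2)$ is compared with Equation (1), evaluated at the simulated $\sigma_R^2$.

```python
import numpy as np


def simulate_full_sib_relationships(n_sibs, rng, n_chrom=22, chrom_len=1.632, step=0.01):
    """Realised additive relationship matrix for one full-sib family.

    Each chromosome is a grid of loci `step` Morgans apart. Under Haldane's
    map function the grandparental origin of a transmitted gamete switches
    between adjacent loci with probability (1 - exp(-2*step)) / 2.
    """
    n_loci = int(round(chrom_len / step)) + 1
    p_switch = 0.5 * (1.0 - np.exp(-2.0 * step))
    rel = np.zeros((n_sibs, n_sibs))
    for _ in range(n_chrom):
        start = rng.integers(0, 2, size=(n_sibs, 2, 1))          # (sib, parent, 1)
        switches = rng.random((n_sibs, 2, n_loci - 1)) < p_switch
        n_sw = np.concatenate([np.zeros((n_sibs, 2, 1), dtype=int),
                               np.cumsum(switches, axis=2)], axis=2)
        origin = (start + n_sw) % 2                               # (sib, parent, locus)
        for parent in range(2):
            same = origin[:, None, parent, :] == origin[None, :, parent, :]
            rel += 0.5 * same.mean(axis=2)
    return rel / n_chrom                                          # diagonal equals 1


def helmert(n):
    K = np.zeros((n - 1, n))
    for i in range(1, n):
        K[i - 1, :i] = 1.0 / np.sqrt(i * (i + 1))
        K[i - 1, i] = -np.sqrt(i / (i + 1))
    return K


def exact_info(R, s2a, s2w):
    """Equation (A1) for one family, parameters ordered (sigma_A^2, sigma_W^2)."""
    n = R.shape[0]
    K = helmert(n)
    V = s2w * np.eye(n) + s2a * R
    P = K.T @ np.linalg.solve(K @ V @ K.T, K)
    PR = P @ R
    return 0.5 * np.array([[np.trace(PR @ PR), np.trace(PR @ P)],
                           [np.trace(PR @ P), np.trace(P @ P)]])


def approx_info(n, s2a, s2w, s2r):
    """Equation (1)."""
    m = n * (1.0 - 2.0 / n + 2.0 / n**2)
    off = -2.0 * m * s2a * s2r / s2w
    return (n - 1) / (2.0 * s2w**2) * np.array(
        [[m * s2r, off], [off, 1.0 + 3.0 * m * s2a**2 * s2r / s2w**2]])


rng = np.random.default_rng(2013)
n, n_fam = 5, 300
s2a, s2e = 0.5, 0.5
s2w = 0.5 * s2a + s2e                 # full-sibs: (1 - A) sigma_A^2 + sigma_E^2 with A = 1/2
iu = np.triu_indices(n, 1)
S_exact, sq_dev = np.zeros((2, 2)), []
for _ in range(n_fam):
    R = simulate_full_sib_relationships(n, rng) - 0.5
    np.fill_diagonal(R, 0.0)          # r_ij = A_ij - 1/2 off the diagonal, r_ii = 0
    sq_dev.append(R[iu] ** 2)
    S_exact += exact_info(R, s2a, s2w) / n_fam
s2r = float(np.mean(np.concatenate(sq_dev)))
var_exact = np.linalg.inv(S_exact)[0, 0]
var_approx = np.linalg.inv(approx_info(n, s2a, s2w, s2r))[0, 0]
print(f"sigma_R^2 = {s2r:.5f}; per-family var(sigma_A^2-hat): "
      f"Eq (A1) {var_exact:.1f}, Eq (1) {var_approx:.1f}")
```

Both variances are expressed per family, as in Table 1; dividing by $f$ gives the value for $f$ families.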

Dominance 

In full-sib families, both additive and dominance variance can, in principle, be estimated. Derivation of the extended information matrix is given in Appendix 2. It depends on the variance $\sigma_Q^2$ in dominance relationships (about their mean of ¼) and the covariance between dominance and additive relationships, $\mathrm{cov}_{RQ}$. However, as Visscher et al. [5] pointed out, the additive and dominance relationships within families are very highly correlated, since the additive coefficient depends on the average number of paternal and maternal genes that are shared identical by descent at a locus, and the dominance coefficient on whether both are shared. The regression of dominance on additive relationships ($\mathrm{cov}_{RQ}/\sigma_R^2$) is equal to 1 and their correlation is approximately 0.9. This implies that, in practice, partitioning $\sigma_A^2$ and $\sigma_D^2$ using within-family information is probably not feasible and, furthermore, that if only an additive model is used, the estimate of $\sigma_A^2$ is biased upwards by $\sigma_D^2$; indeed it essentially has expectation $\sigma_A^2 + \sigma_D^2$.
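The regression of 1 and correlation of about 0.9 can be checked by simulation. The sketch below is an illustration only, assuming Haldane's map function, 22 chromosomes of 1.632 M on a 1 cM grid and independent full-sib pairs; it is not the simulation programme used for Table 1, and the function name is hypothetical. It estimates $\sigma_R^2$, $\sigma_Q^2$ and $\mathrm{cov}_{RQ}$ from deviations of realised additive and dominance relationships from their pedigree expectations of ½ and ¼.

```python
import numpy as np


def simulate_sib_pair_relationships(n_pairs, n_chrom=22, chrom_len=1.632, step=0.01, seed=7):
    """Realised additive (a) and dominance (d) relationships of independent full-sib pairs.

    At each locus the pair is IBD for the maternal (or paternal) allele when both
    sibs carry the same grandparental origin; a averages the two IBD indicators,
    d records whether both alleles are IBD.
    """
    rng = np.random.default_rng(seed)
    n_loci = int(round(chrom_len / step)) + 1
    p_switch = 0.5 * (1.0 - np.exp(-2.0 * step))     # Haldane, between adjacent grid loci
    a = np.zeros(n_pairs)
    d = np.zeros(n_pairs)
    for _ in range(n_chrom):
        start = rng.integers(0, 2, size=(n_pairs, 2, 2, 1))     # (pair, sib, parent, 1)
        switches = rng.random((n_pairs, 2, 2, n_loci - 1)) < p_switch
        n_sw = np.concatenate([np.zeros((n_pairs, 2, 2, 1), dtype=int),
                               np.cumsum(switches, axis=3)], axis=3)
        origin = (start + n_sw) % 2
        ibd = origin[:, 0] == origin[:, 1]                      # (pair, parent, locus)
        a += ibd.mean(axis=(1, 2))                              # (p_mat + p_pat) / 2
        d += (ibd[:, 0] & ibd[:, 1]).mean(axis=1)               # both alleles IBD
    return a / n_chrom, d / n_chrom


a, d = simulate_sib_pair_relationships(2000)
r, q = a - 0.5, d - 0.25                  # deviations from pedigree expectations
s2r, s2q, cov_rq = np.mean(r * r), np.mean(q * q), np.mean(r * q)
slope = cov_rq / s2r
corr = cov_rq / np.sqrt(s2r * s2q)
print(f"sigma_R^2 = {s2r:.5f}, sigma_Q^2 = {s2q:.5f}, "
      f"regression of q on r = {slope:.3f}, correlation = {corr:.3f}")
```

Under these assumptions the regression should come out close to 1, the correlation close to 0.9, and $\sigma_R^2$ of the order of 0.0015.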

Discussion and conclusions 

The analysis shows that the sampling variances of estimates of heritability based on within-family realised relationships fall roughly in proportion to $1/n^2$ as family size $n$ increases, i.e. inversely with the number of pairwise comparisons among individuals in the family, and inversely with the number of families. Therefore, when undertaking such an analysis, it is more efficient to use a few very large families, although one might be reluctant to use just one or very few families in case they are atypical [11]. Here, a model of a continuous genome was used, rather than a finite number of independent regions as by Ødegård and Meuwissen [11], and the calculations assumed a fairly even distribution of genetic variance along the genome. If there is much heterogeneity, e.g. a few QTL of large effect, the sampling errors of genetic variance estimates would increase. In the present analysis, we make the assumption that shared segments are identified accurately, for example using Merlin [13].
Table 1 Comparison of $\mathrm{var}(\hat\sigma_A^2)$ predicted from the information matrix directly and from the Taylor series approximation*

Family   h²     chr   n    Eq (A1)   Eq (1)   Eq (3)
HS       0.5    22    5    174       182      182
HS       0.5    22    15   12.2      11.8     11.7
HS       0.5    22    25   4.15      3.88     3.80
FS       0.25   22    5    94        101      101
FS       0.25   22    15   6.67      6.56     6.53
FS       0.25   22    25   2.26      2.18     2.16
FS       0.5    22    5    82.6      88.6     88.1
FS       0.5    22    15   5.94      5.83     5.69
FS       0.5    22    25   2.04      1.97     1.88
FS       0.75   22    5    71.4      77.1     76.0
FS       0.75   22    15   5.20      5.23     4.90
FS       0.75   22    25   1.81      1.82     1.62
FS       0.04   1     5    4.80      5.26     5.26
FS       0.04   1     15   0.354     0.331    0.330
FS       0.04   1     25   0.127     0.110    0.110

*Predictions were obtained directly by inverting the realised information matrix (Eq. A1) obtained from sampling relationships, and from the Taylor series approximation (Eq. 1) using the variance of relationships directly; variances were computed by averaging information over samples of 100 families, but are expressed for a single family, so for f families $\mathrm{var}(\hat\sigma_A^2)$ should be divided by f; predictions using the simplification (Eq. 3) are shown similarly; results are for half-sib (HS) and full-sib (FS) families; h² is the proportion of variance contributed by the fitted chromosomes; chr is the number of chromosomes; chr = 22 denotes the whole genome; chr = 1 denotes a single chromosome.



Ødegård and Meuwissen [11] investigated the effect of selectively genotyping only the individuals with high and low phenotypes within a family, when all phenotypes are included in the REML analysis. The efficiency of this approach was good in terms of sampling errors, but estimates of heritability were biased downwards when sample sizes were small. This may reflect insufficient marker coverage of the genes of interest because of lack of linkage disequilibrium, in which case this bias may be hard to avoid, but possibly also bias caused by selection.

They also estimated actual relationships from a finite number of markers and occasionally obtained a singular matrix in their simulated replicates [11]. To investigate the cause, relationships were sampled from a continuous chromosome model [10] and the exact allele sharing was computed. Pairs of individuals can inherit identical non-recombinant short chromosomes, thereby yielding a relationship matrix that is positive semi-definite (i.e. with some zero but no negative eigenvalues). In the unlikely event that this occurs at all chromosomes, the data can still be analysed by REML. Negative eigenvalues were not obtained in our simulations and indeed seem infeasible, because the relationships were jointly sampled. Negative eigenvalues are a consequence of the estimation of weak relationships from marker data and might arise in practice.

A different approach to estimating the genetic variance free of common environment was suggested by Yang et al. [14]. They fitted by regression all the SNPs to data from individuals sampled from the population that are not known to be related, after removing any pairs with a relationship above a low threshold so as to minimise the chance of shared environment. Such an analysis is, however, expected to give a lower estimate of heritability than the within-family analysis discussed here, because marker-associated effects in the population can be missed through incomplete linkage disequilibrium, especially when trait genes have low minor allele frequencies, as indeed seems to be the case [14].

A 'back of the envelope' calculation allows a simple comparison of the sampling errors of estimates of additive genetic variance from within families utilising variation in relationship, $\hat\sigma_{Aw}^2$, and from between families using ANOVA, $\hat\sigma_{Ab}^2$ (Appendix 3). Provided the families are not small, $\mathrm{var}(\hat\sigma_{Aw}^2)/\mathrm{var}(\hat\sigma_{Ab}^2) \approx (A^2/\sigma_R^2)/[1 + nA\sigma_A^2/\sigma_W^2]^2$. With use of half-sib families ($A = 1/4$) to eliminate maternal effects in the between-family estimate, for a genome of human length, $A^2/\sigma_R^2 = (0.25/0.028)^2 \approx 80$. Assuming that $A\sigma_A^2 = \sigma_W^2/5$ (for half-sibs without common environment this corresponds to a heritability of 2/3), the ratio of variances is approximately $80/(1 + n/5)^2$, equalling 1.0 when $n \approx 40$. This implies that, with half-sib families of size 40, a similar amount of information would be obtained from within- and between-family data. With fewer, larger families, the estimate from within-family information would have the lower standard error. Furthermore, because the within- and between-family estimates use the data in different ways, they are presumably uncorrelated and so can simply be combined. However, estimates from both sources may be biased to different extents by common environment, dominance, epistasis, etc., so specific applications require specific consideration.
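The break-even family size follows directly from the ratio above. The sketch below takes $A = 1/4$, $\sigma_R = 0.028$ and $A\sigma_A^2/\sigma_W^2 = 1/5$ as assumed inputs from this paragraph; the function names are hypothetical. Setting the ratio to 1 gives $1 + nk = A/\sigma_R$, with $k = A\sigma_A^2/\sigma_W^2$.

```python
def within_between_variance_ratio(n, A, sd_r, k):
    """var(sigma_Aw^2-hat) / var(sigma_Ab^2-hat) ~ (A^2/sigma_R^2) / (1 + n k)^2,
    where k = A sigma_A^2 / sigma_W^2."""
    return (A / sd_r) ** 2 / (1.0 + n * k) ** 2


def break_even_family_size(A, sd_r, k):
    """Family size at which the ratio equals 1, i.e. 1 + n k = A / sd_r."""
    return (A / sd_r - 1.0) / k


A, SD_R_HALF_SIB, K_RATIO = 0.25, 0.028, 1.0 / 5.0   # half-sibs, human-length genome
print("A^2/sigma_R^2 =", round((A / SD_R_HALF_SIB) ** 2, 1))
for n in (10, 20, 40, 80):
    print(n, round(within_between_variance_ratio(n, A, SD_R_HALF_SIB, K_RATIO), 2))
print("break-even n ~", round(break_even_family_size(A, SD_R_HALF_SIB, K_RATIO), 1))
```

Ratios below 1 favour the within-family estimator, so beyond the break-even size it has the smaller sampling variance.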

There are other aspects that could be examined. For example, additive and within-family genetic covariances and correlations among traits can be estimated from a multi-trait analysis with the same data structure. Clearly the magnitude of their sampling errors is structured similarly to those of the corresponding variances of the individual traits. Estimation of variation due to any individual autosome can be achieved by fitting just the relationship on this chromosome, and similarly for the sex chromosome [6]. The variance of the corresponding relationships is then much higher and depends on the length of the chromosome, being roughly inversely proportional to its length. Although $\mathrm{var}(\hat\sigma_A^2)$ per chromosome is then much smaller, the coefficient of variation of its estimate may be similar to that for the whole genome, under the simplest assumption that the contribution of any chromosome to $\sigma_A^2$ is roughly proportional to its length.

A problem specific to the within-family approach is the high degree of confounding between additive and dominance effects in full-sib families (albeit there is also complete confounding in a between full-sib family analysis). This is not resolved by estimating $\sigma_A^2$ separately from maternal and paternal sharing, since the dominance coefficient is the correlated intersection of these. The point is that, while maternal genomic similarity appears to include only the additive component because only one sire is involved, interactions between sire and dam effects, i.e. dominance, are included. Half-sib families with multiple dams per sire, or a cross-classified structure, are needed, similar to when between-family correlations are used for estimation.

If, for example, a number of males and females are put together for mating in a single environment, then the pedigree can be obtained from genetic markers. Hence, paternal half-sibs, maternal half-sibs and full-sibs can be distinguished and the between-family covariance can be used. Additional information from within-family segregation could be identified via the markers, but this would likely contribute little. For example, in a pen comprising such a diallel structure, the variation in pedigree relationships ($A$ = 0, ¼ or ½) is likely to be much larger than the variation in realised relationships among pairs with the same pedigree relationship.

Epistatic variance raises further difficulties of potential confounding and estimation. On a whole-genome basis, the relevant coefficient for the additive × additive variance component is the square of the relationship, which is highly correlated with the additive coefficient. Thus, as in analyses between families, obtaining a satisfactory partition between additive and additive × additive or higher-order components is probably not feasible. A further problem is potential bias due to epistatic effects in the estimation of additive (e.g. from additive × additive effects) and dominance variance. Although the expected probability that sibs share alleles at pairs of genomic sites is small for the genome as a whole, it is much higher for nearby sites. Thus, if epistatic effects are substantial and predominantly cis-acting, this bias could be important. To partially address this, Visscher et al. [6] fitted the mean relationship for each chromosome in a multiple regression model for human height. The variance removed by fitting variation in relationships for each chromosome was essentially the same whether chromosomes were fitted independently or in a joint analysis, indicating little or no interaction between regions on different chromosomes. Extending this more generally requires genomic regions to be defined such that joint identity by descent can be computed.

Within-family analysis, particularly when families are large, has attractive features because it avoids bias due to common environmental effects, but it introduces other potential confounding effects, as noted above. It also requires much genotyping, with associated costs. Although in a breeding context this type of information may be available when collecting data to implement genomic prediction and subsequent selection, estimates of the variance components may not in themselves have value beyond what is obtained from the marker-trait associations. But this is something to think about.

Appendix 1: Derivation of the sampling variance for the additive model

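For the REML analysis, the information matrix $\mathbf{S}$, which in turn yields the sampling variances, based on $\mathbf{S}^{-1}$, of the estimates of $\sigma_A^2$ and $\sigma_W^2$ for each family, is defined by Lynch and Walsh (see page 791 in [2]):

$$\mathbf{S} = \frac{1}{2}\begin{pmatrix} \mathrm{tr}(\mathbf{PRPR}) & \mathrm{tr}(\mathbf{PRP}) \\ \mathrm{tr}(\mathbf{PRP}) & \mathrm{tr}(\mathbf{PP}) \end{pmatrix}, \tag{A1}$$

where tr denotes the trace operator. Matrix $\mathbf{P} = \mathbf{K}'(\mathbf{KVK}')^{-1}\mathbf{K}$, and $\mathbf{K}$, of dimension $(n-1)\times n$, defines contrasts such that $\mathbf{KX} = \mathbf{0}$, where $\mathbf{X}$ is the design matrix; since family members are contemporaneous in the same environment, $\mathbf{X}$ is a vector of ones. The Helmert contrasts are suitable for $\mathbf{K}$: for $i = 1, \ldots, n-1$, $k_{ij} = [i(i+1)]^{-1/2}$ for $j \le i$, $k_{i,i+1} = -[i/(i+1)]^{1/2}$, and $k_{ij} = 0$ for $j > i+1$. Note that $\mathbf{KK}' = \mathbf{I}_{(n-1)\times(n-1)}$ and $\mathbf{K}'\mathbf{K} = \mathbf{I}_{n\times n} - \tfrac1n\mathbf{J}_{n\times n}$, where all elements of $\mathbf{J}$ equal 1, and $(\mathbf{K}'\mathbf{K})^2 = \mathbf{K}'\mathbf{K}$.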
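The stated properties of the Helmert contrasts are easy to confirm numerically. This minimal sketch builds $\mathbf{K}$ for an illustrative $n = 6$ (the helper name is hypothetical) and checks $\mathbf{K}\mathbf{1} = \mathbf{0}$, $\mathbf{KK}' = \mathbf{I}$, $\mathbf{K}'\mathbf{K} = \mathbf{I} - \mathbf{J}/n$, idempotency, and that with $\mathbf{R} = \mathbf{0}$ the matrix $\mathbf{P}$ reduces to $(\mathbf{I} - \mathbf{J}/n)/\sigma_W^2$.

```python
import numpy as np


def helmert_contrasts(n):
    """Rows i = 1..n-1: 1/sqrt(i(i+1)) in the first i columns, -sqrt(i/(i+1)) in column i+1."""
    K = np.zeros((n - 1, n))
    for i in range(1, n):
        K[i - 1, :i] = 1.0 / np.sqrt(i * (i + 1))
        K[i - 1, i] = -np.sqrt(i / (i + 1))
    return K


n, s2w = 6, 0.75
K = helmert_contrasts(n)
I, J = np.eye(n), np.ones((n, n))
KtK = K.T @ K
print(np.allclose(K @ np.ones(n), 0.0),       # KX = 0 for X a vector of ones
      np.allclose(K @ K.T, np.eye(n - 1)),    # KK' = I
      np.allclose(KtK, I - J / n),            # K'K = I - J/n
      np.allclose(KtK @ KtK, KtK))            # idempotent
# With R = 0 the projection P = K'(KVK')^{-1}K reduces to (I - J/n) / sigma_W^2.
P0 = K.T @ np.linalg.solve(K @ (s2w * I) @ K.T, K)
print(np.allclose(P0, (I - J / n) / s2w))
```

Any other full-rank set of contrasts orthogonal to $\mathbf{1}$ gives the same $\mathbf{P}$; the Helmert form is simply convenient to write down.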

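The expected information using the Taylor series expansion has terms of the following form:

$$\mathrm{E}(\mathbf{PRPR}) = \mathbf{PRPR}\big|_{\mathbf{R}=\mathbf{0}} + \sum_{i<j}\frac{\partial(\mathbf{PRPR})}{\partial r_{ij}}\Big|_{\mathbf{R}=\mathbf{0}}\mathrm{E}(r_{ij}) + \frac12\sum_{i<j}\sum_{k<l}\frac{\partial^2(\mathbf{PRPR})}{\partial r_{ij}\,\partial r_{kl}}\Big|_{\mathbf{R}=\mathbf{0}}\mathrm{E}(r_{ij}r_{kl}) + \ldots$$

We note that $\mathrm{E}(r_{ij}) = 0$ and, assuming independent Mendelian segregation to each offspring, $\mathrm{E}(r_{ij}r_{kl}) = 0$ for $i \ne k$ and/or $j \ne l$, and $\mathrm{E}(r_{ij}^2) = \sigma_R^2$, where $\sigma_R^2$ is the variance in relationship. Differentiating,

$$\frac{\partial(\mathbf{PRPR})}{\partial r_{ij}} = \frac{\partial\mathbf{P}}{\partial r_{ij}}\mathbf{RPR} + \mathbf{P}\frac{\partial\mathbf{R}}{\partial r_{ij}}\mathbf{PR} + \mathbf{PR}\frac{\partial\mathbf{P}}{\partial r_{ij}}\mathbf{R} + \mathbf{PRP}\frac{\partial\mathbf{R}}{\partial r_{ij}}, \tag{A2}$$

and when evaluated as $\mathbf{R} \to \mathbf{0}$, all terms in (A2) become zero. Furthermore, on differentiating (A2) to obtain the second derivative, all remaining terms in $\mathbf{R}$ are also zero; and as $\mathbf{R}$ is linear in $r_{ij}$, $\partial^2\mathbf{R}/\partial r_{ij}\partial r_{kl} = \mathbf{0}$. Finally, as $\mathrm{E}(r_{ij}r_{kl}) = 0$ unless $i = k$ and $j = l$, $\mathrm{E}(\mathbf{PRPR})$ reduces to

$$\mathrm{E}(\mathbf{PRPR}) \to \frac12\sum_{i<j}\left(\mathbf{P}\frac{\partial\mathbf{R}}{\partial r_{ij}}\mathbf{P}\frac{\partial\mathbf{R}}{\partial r_{ij}} + \frac{\partial\mathbf{R}}{\partial r_{ij}}\mathbf{P}\frac{\partial\mathbf{R}}{\partial r_{ij}}\mathbf{P}\right)\sigma_R^2.$$

Let $\partial\mathbf{R}/\partial r_{ij} = \mathbf{X}_{ij}$, with elements $x_{ij} = x_{ji} = 1$ and 0 otherwise; so, taking $\mathbf{R} \to \mathbf{0}$,

$$\mathrm{E}(\mathbf{PRPR}) \to \frac12\sum_{i<j}\left(\mathbf{PX}_{ij}\mathbf{PX}_{ij} + \mathbf{X}_{ij}\mathbf{PX}_{ij}\mathbf{P}\right)\sigma_R^2. \tag{A3}$$

As $\mathbf{R} \to \mathbf{0}$, $\mathbf{V} \to \mathbf{I}\sigma_W^2$ and $\mathbf{P} \to \mathbf{K}'(\mathbf{KVK}')^{-1}\mathbf{K} = (\mathbf{I} - \tfrac1n\mathbf{J})/\sigma_W^2$. Defining further matrices $\mathbf{Y}_{ij}$, where $y_{ii} = y_{jj} = 1$ and 0 otherwise, and $\mathbf{W}_{ij}$, whose $i$th and $j$th columns consist of ones and whose other elements are 0, we have $\mathbf{X}_{ij}\mathbf{X}_{ij} = \mathbf{Y}_{ij}$, $\mathbf{JX}_{ij} = \mathbf{JY}_{ij} = \mathbf{W}_{ij}$, $\mathbf{W}_{ij}\mathbf{W}_{ij} = 2\mathbf{W}_{ij}$, and $\mathrm{tr}(\mathbf{X}_{ij}) = 0$, $\mathrm{tr}(\mathbf{Y}_{ij}) = \mathrm{tr}(\mathbf{W}_{ij}) = 2$. As the trace operator is commutative, it follows, by summing over the $n(n-1)/2$ off-diagonal elements in (A3), all having the same expectation, that

$$\begin{aligned}\mathrm{E}[\mathrm{tr}(\mathbf{PRPR})] &\approx \tfrac12 n(n-1)\,\mathrm{tr}\big[(\mathbf{I}-\tfrac1n\mathbf{J})\mathbf{X}_{ij}(\mathbf{I}-\tfrac1n\mathbf{J})\mathbf{X}_{ij}\big]\sigma_R^2/\sigma_W^4 \\ &= \tfrac12 n(n-1)\,\mathrm{tr}\big(\mathbf{Y}_{ij} - 2\mathbf{W}_{ij}/n + 2\mathbf{W}_{ij}/n^2\big)\sigma_R^2/\sigma_W^4 \\ &= n(n-1)(1 - 2/n + 2/n^2)\sigma_R^2/\sigma_W^4 = (n-1)m\sigma_R^2/\sigma_W^4, \end{aligned} \tag{A4}$$

where $m = n(1 - 2/n + 2/n^2)$.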
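The matrix identities and the per-pair trace used in (A4) can be checked numerically for small $n$. This minimal sketch assumes $\mathbf{W}_{ij}$ has ones in columns $i$ and $j$, the form for which $\mathbf{JX}_{ij} = \mathbf{W}_{ij}$ holds; the helper name is hypothetical.

```python
import numpy as np


def pair_matrices(n, i, j):
    """X_ij, Y_ij and W_ij (W_ij: columns i and j filled with ones)."""
    X = np.zeros((n, n))
    X[i, j] = X[j, i] = 1.0
    Y = np.zeros((n, n))
    Y[i, i] = Y[j, j] = 1.0
    W = np.zeros((n, n))
    W[:, [i, j]] = 1.0
    return X, Y, W


for n in range(2, 11):
    J = np.ones((n, n))
    M = np.eye(n) - J / n                       # I - J/n
    X, Y, W = pair_matrices(n, 0, n - 1)
    assert np.allclose(X @ X, Y)
    assert np.allclose(J @ X, W) and np.allclose(J @ Y, W)
    assert np.allclose(W @ W, 2 * W)
    per_pair = np.trace(M @ X @ M @ X)          # 2(1 - 2/n + 2/n^2)
    assert np.isclose(per_pair, 2 * (1 - 2 / n + 2 / n**2))
    m = n * (1 - 2 / n + 2 / n**2)
    assert np.isclose(0.5 * n * (n - 1) * per_pair, (n - 1) * m)   # sum over pairs in (A4)
print("identities used in (A4) hold for n = 2..10")
```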

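We give less detail for other terms in the information matrix. For the off-diagonal term,

$$\frac{\partial(\mathbf{PRP})}{\partial r_{ij}} = \frac{\partial\mathbf{P}}{\partial r_{ij}}\mathbf{RP} + \mathbf{P}\frac{\partial\mathbf{R}}{\partial r_{ij}}\mathbf{P} + \mathbf{PR}\frac{\partial\mathbf{P}}{\partial r_{ij}}.$$

Non-zero second derivatives must involve differentiation once of $\mathbf{P}$ and once of $\mathbf{R}$. Hence

$$\mathrm{E}(\mathbf{PRP}) \approx \frac12\sum_{i<j}\left(2\frac{\partial\mathbf{P}}{\partial r_{ij}}\frac{\partial\mathbf{R}}{\partial r_{ij}}\mathbf{P} + 2\mathbf{P}\frac{\partial\mathbf{R}}{\partial r_{ij}}\frac{\partial\mathbf{P}}{\partial r_{ij}}\right)\sigma_R^2. \tag{A5}$$

Now

$$\frac{\partial\mathbf{P}}{\partial r_{ij}} = -\mathbf{K}'(\mathbf{KVK}')^{-1}\mathbf{K}\frac{\partial\mathbf{V}}{\partial r_{ij}}\mathbf{K}'(\mathbf{KVK}')^{-1}\mathbf{K} \tag{A6}$$

and, as $\mathbf{R} \to \mathbf{0}$,

$$\frac{\partial\mathbf{P}}{\partial r_{ij}} \to -\frac{\sigma_A^2}{\sigma_W^4}(\mathbf{I}-\tfrac1n\mathbf{J})\mathbf{X}_{ij}(\mathbf{I}-\tfrac1n\mathbf{J}), \qquad \frac{\partial\mathbf{P}}{\partial r_{ij}}\mathbf{X}_{ij}\mathbf{P} \to -\frac{\sigma_A^2}{\sigma_W^6}(\mathbf{I}-\tfrac1n\mathbf{J})\mathbf{X}_{ij}(\mathbf{I}-\tfrac1n\mathbf{J})\mathbf{X}_{ij}(\mathbf{I}-\tfrac1n\mathbf{J}).$$

As the trace is commutative and $\mathbf{I} - \tfrac1n\mathbf{J}$ is idempotent, putting the last such matrix in (A5) first, we see that:

$$\mathrm{E}[\mathrm{tr}(\mathbf{PRP})] \approx -2n(n-1)(1 - 2/n + 2/n^2)\sigma_A^2\sigma_R^2/\sigma_W^6 = -2(n-1)m\sigma_A^2\sigma_R^2/\sigma_W^6.$$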
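The three expected traces that make up Equation (1), namely (A4), the $\mathrm{tr}(\mathbf{PRP})$ result just obtained and the $\mathrm{tr}(\mathbf{PP})$ result for the remaining term, can be checked by Monte Carlo under this appendix's own assumption of independent $r_{ij}$ with variance $\sigma_R^2$. The sketch below is illustrative only: normally distributed $r_{ij}$ with a small SD (0.02), $n = 5$ and the variances are all assumptions, and antithetic sampling (evaluating both $\mathbf{R}$ and $-\mathbf{R}$) is used so that odd-order Taylor terms cancel and only the second-order terms being checked remain.

```python
import numpy as np


def contrast_matrix(n):
    """Orthonormal contrasts: eigenvectors of I - J/n with eigenvalue 1 (any such K gives the same P)."""
    evals, U = np.linalg.eigh(np.eye(n) - np.ones((n, n)) / n)
    return U[:, evals > 0.5].T


def exact_traces(R, K, s2a, s2w):
    """Exact tr(PRPR), tr(PRP) and tr(PP) for V = s2w I + s2a R."""
    n = R.shape[0]
    V = s2w * np.eye(n) + s2a * R
    P = K.T @ np.linalg.solve(K @ V @ K.T, K)
    PR = P @ R
    return np.array([np.trace(PR @ PR), np.trace(PR @ P), np.trace(P @ P)])


rng = np.random.default_rng(11)
n, s2a, s2w, sd_r, n_rep = 5, 0.5, 0.75, 0.02, 4000
K = contrast_matrix(n)
iu = np.triu_indices(n, 1)
mean_traces = np.zeros(3)
for _ in range(n_rep):
    R = np.zeros((n, n))
    R[iu] = rng.normal(0.0, sd_r, size=iu[0].size)   # independent r_ij, as assumed above
    R = R + R.T
    for sign in (1.0, -1.0):                          # antithetic pair: odd-order terms cancel
        mean_traces += exact_traces(sign * R, K, s2a, s2w) / (2 * n_rep)

m, s2r = n * (1 - 2 / n + 2 / n**2), sd_r**2
predicted = np.array([(n - 1) * m * s2r / s2w**2,
                      -2 * (n - 1) * m * s2a * s2r / s2w**3,
                      3 * (n - 1) * m * s2a**2 * s2r / s2w**4])
observed = mean_traces - np.array([0.0, 0.0, (n - 1) / s2w**2])   # tr(PP) beyond its R = 0 value
print(np.round(observed / predicted, 3))
```

Ratios close to 1 indicate that the second-order expansion captures the expectations; this checks the algebra under independent $r_{ij}$, not the realism of that assumption for actual sibships.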



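Differentiating (A6) again, and noting that $\partial^2\mathbf{V}/\partial r_{ij}^2 = \mathbf{0}$,

$$\frac{\partial^2\mathbf{P}}{\partial r_{ij}^2} = 2\mathbf{K}'(\mathbf{KVK}')^{-1}\mathbf{K}\frac{\partial\mathbf{V}}{\partial r_{ij}}\mathbf{K}'(\mathbf{KVK}')^{-1}\mathbf{K}\frac{\partial\mathbf{V}}{\partial r_{ij}}\mathbf{K}'(\mathbf{KVK}')^{-1}\mathbf{K}.$$

For the last term of the information matrix, when $\mathbf{R} = \mathbf{0}$, $\mathbf{P} = (\mathbf{I} - \tfrac1n\mathbf{J})/\sigma_W^2$ and $\mathrm{tr}(\mathbf{PP}) = (n-1)/\sigma_W^4$. Now considering the terms in $r_{ij}$,

$$\frac{\partial^2(\mathbf{PP})}{\partial r_{ij}^2} = \frac{\partial^2\mathbf{P}}{\partial r_{ij}^2}\mathbf{P} + 2\frac{\partial\mathbf{P}}{\partial r_{ij}}\frac{\partial\mathbf{P}}{\partial r_{ij}} + \mathbf{P}\frac{\partial^2\mathbf{P}}{\partial r_{ij}^2},$$

and hence, using the commutative property of the trace, as $\mathbf{R} \to \mathbf{0}$,

$$\frac{\partial^2\mathrm{tr}(\mathbf{PP})}{\partial r_{ij}^2} \to 6\,\mathrm{tr}\left(\mathbf{K}'(\mathbf{KVK}')^{-1}\mathbf{K}\frac{\partial\mathbf{V}}{\partial r_{ij}}\mathbf{K}'(\mathbf{KVK}')^{-1}\mathbf{K}\frac{\partial\mathbf{V}}{\partial r_{ij}}\mathbf{K}'(\mathbf{KVK}')^{-2}\mathbf{K}\right) = 6\,\mathrm{tr}\big[(\mathbf{I}-\tfrac1n\mathbf{J})\mathbf{X}_{ij}(\mathbf{I}-\tfrac1n\mathbf{J})\mathbf{X}_{ij}(\mathbf{I}-\tfrac1n\mathbf{J})\big]\sigma_A^4/\sigma_W^8.$$

Therefore, using previous results,

$$\mathrm{E}[\mathrm{tr}(\mathbf{PP})] \approx (n-1)/\sigma_W^4 + 3(n-1)m\sigma_A^4\sigma_R^2/\sigma_W^8,$$

thus completing the derivation of the information matrix in Equation (1) of the main text.

Appendix 2: Fitting additive and dominance variances

Let $\mathbf{V} = \mathbf{I}\sigma_W^2 + \mathbf{R}\sigma_A^2 + \mathbf{Q}\sigma_D^2$, of dimension $n \times n$, where, for full-sib families, $\sigma_W^2 = \sigma_E^2 + \tfrac12\sigma_A^2 + \tfrac34\sigma_D^2$. Additive and dominance effects of the loci are assumed to be uncorrelated. Let $\mathbf{Q}$, with elements $q_{ij}$, define the departure of the realised dominance relationship of full-sibs from its expectation, and let $\sigma_Q^2$ denote $\mathrm{var}(q_{ij})$ and, similarly, $\mathrm{cov}_{RQ}$ denote $\mathrm{cov}(r_{ij}, q_{ij})$. The information matrix is now [2]:

$$\mathbf{S} = \frac12\begin{pmatrix} \mathrm{tr}(\mathbf{PRPR}) & \mathrm{tr}(\mathbf{PRPQ}) & \mathrm{tr}(\mathbf{PRP}) \\ \mathrm{tr}(\mathbf{PRPQ}) & \mathrm{tr}(\mathbf{PQPQ}) & \mathrm{tr}(\mathbf{PQP}) \\ \mathrm{tr}(\mathbf{PRP}) & \mathrm{tr}(\mathbf{PQP}) & \mathrm{tr}(\mathbf{PP}) \end{pmatrix}.$$

The term $\mathrm{E}[\mathrm{tr}(\mathbf{PRPR})] \approx (n-1)m\sigma_R^2/\sigma_W^4$ is unchanged from the additive case and, by symmetry,

$$\mathrm{E}[\mathrm{tr}(\mathbf{PQPQ})] \approx (n-1)m\sigma_Q^2/\sigma_W^4 \quad\text{and}\quad \mathrm{E}[\mathrm{tr}(\mathbf{PRPQ})] \approx (n-1)m\,\mathrm{cov}_{RQ}/\sigma_W^4.$$

The derivative of the term $\mathbf{PRP}$ with respect to $r_{ij}$ remains

$$\frac{\partial(\mathbf{PRP})}{\partial r_{ij}} = \frac{\partial\mathbf{P}}{\partial r_{ij}}\mathbf{RP} + \mathbf{P}\frac{\partial\mathbf{R}}{\partial r_{ij}}\mathbf{P} + \mathbf{PR}\frac{\partial\mathbf{P}}{\partial r_{ij}},$$

and the expectation of its second derivative with respect to $r_{ij}$ is unchanged. However, now taking the second derivative with respect to $r_{ij}$ and $q_{ij}$, we obtain additional terms with non-zero expectation,

$$\frac{\partial^2(\mathbf{PRP})}{\partial r_{ij}\,\partial q_{ij}} = \frac{\partial\mathbf{P}}{\partial q_{ij}}\frac{\partial\mathbf{R}}{\partial r_{ij}}\mathbf{P} + \mathbf{P}\frac{\partial\mathbf{R}}{\partial r_{ij}}\frac{\partial\mathbf{P}}{\partial q_{ij}}.$$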

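Hence $\mathrm{E}[\mathrm{tr}(\mathbf{PRP})] \approx -2(n-1)m(\sigma_A^2\sigma_R^2 + \mathrm{cov}_{RQ}\sigma_D^2)/\sigma_W^6$, and similarly $\mathrm{E}[\mathrm{tr}(\mathbf{PQP})] \approx -2(n-1)m(\mathrm{cov}_{RQ}\sigma_A^2 + \sigma_Q^2\sigma_D^2)/\sigma_W^6$. The term $\mathrm{E}[\mathrm{tr}(\mathbf{PP})]$ is non-zero when differentiated twice with respect to $r_{ij}$, twice with respect to $q_{ij}$, and once each with respect to both variables. Hence

$$\mathrm{E}[\mathrm{tr}(\mathbf{PP})] \approx (n-1)/\sigma_W^4 + 3(n-1)m(\sigma_A^4\sigma_R^2 + 2\,\mathrm{cov}_{RQ}\sigma_A^2\sigma_D^2 + \sigma_D^4\sigma_Q^2)/\sigma_W^8.$$

Hill Genetics Selection Evolution 2013, 45:32
http://www.gsejournal.Org/content/45/1/32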

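The information matrix for a single family is therefore

$$\mathbf{S} \approx \frac{n-1}{2\sigma_W^4}\begin{pmatrix} m\sigma_R^2 & m\,\mathrm{cov}_{RQ} & -2m(\sigma_A^2\sigma_R^2 + \sigma_D^2\mathrm{cov}_{RQ})/\sigma_W^2 \\ & m\sigma_Q^2 & -2m(\mathrm{cov}_{RQ}\sigma_A^2 + \sigma_Q^2\sigma_D^2)/\sigma_W^2 \\ \text{symm} & & 1 + 3m(\sigma_A^4\sigma_R^2 + 2\,\mathrm{cov}_{RQ}\sigma_A^2\sigma_D^2 + \sigma_D^4\sigma_Q^2)/\sigma_W^4 \end{pmatrix}.$$

These equations apply to estimates of $\sigma_A^2$, $\sigma_D^2$ and $\sigma_W^2$. For full-sib families, the estimate of the error variance would be $\hat\sigma_E^2 = \hat\sigma_W^2 - \tfrac12\hat\sigma_A^2 - \tfrac34\hat\sigma_D^2$, and its sampling error computed accordingly from $\mathbf{S}^{-1}$.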

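As noted in the main text, $\mathrm{cov}_{RQ} = \sigma_R^2$, so $\mathbf{S}$ simplifies to

$$\mathbf{S} \approx \frac{n-1}{2\sigma_W^4}\begin{pmatrix} m\sigma_R^2 & m\sigma_R^2 & -2m\sigma_R^2(\sigma_A^2 + \sigma_D^2)/\sigma_W^2 \\ & m\sigma_Q^2 & -2m(\sigma_R^2\sigma_A^2 + \sigma_Q^2\sigma_D^2)/\sigma_W^2 \\ \text{symm} & & 1 + 3m[(\sigma_A^4 + 2\sigma_A^2\sigma_D^2)\sigma_R^2 + \sigma_D^4\sigma_Q^2]/\sigma_W^4 \end{pmatrix}.$$

However, as $\sigma_R^2$ and $\sigma_Q^2$ have similar magnitude, $\mathbf{S}$ is almost singular and thus the genotypic variance cannot be partitioned into additive and dominance components unless the dataset is very large.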
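The near-singularity can be illustrated numerically. The sketch below builds the three-parameter information matrix of this appendix for assumed full-sib parameters ($\sigma_A^2 = 0.4$, $\sigma_D^2 = 0.1$, $\sigma_E^2 = 0.5$, 50 families of 20, $\sigma_R^2 = 0.00153$), sets $\mathrm{cov}_{RQ} = \sigma_R^2$, and chooses $\sigma_Q^2 = \sigma_R^2/0.81$ so that the correlation between additive and dominance relationships is 0.9, as stated in the main text. The function name is hypothetical. It compares $\mathrm{var}(\hat\sigma_A^2)$ when $\sigma_D^2$ is estimated jointly with the value obtained if $\sigma_D^2$ were treated as known.

```python
import numpy as np


def info_matrix_ad(n, s2a, s2d, s2w, s2r, s2q, cov_rq):
    """Per-family information for (sigma_A^2, sigma_D^2, sigma_W^2), Appendix 2."""
    m = n * (1.0 - 2.0 / n + 2.0 / n**2)
    S = np.empty((3, 3))
    S[0, 0] = m * s2r
    S[1, 1] = m * s2q
    S[0, 1] = S[1, 0] = m * cov_rq
    S[0, 2] = S[2, 0] = -2.0 * m * (s2a * s2r + s2d * cov_rq) / s2w
    S[1, 2] = S[2, 1] = -2.0 * m * (cov_rq * s2a + s2q * s2d) / s2w
    S[2, 2] = 1.0 + 3.0 * m * (s2a**2 * s2r + 2.0 * cov_rq * s2a * s2d + s2d**2 * s2q) / s2w**2
    return (n - 1) / (2.0 * s2w**2) * S


# Regression of Q on R equal to 1 (cov_RQ = sigma_R^2) and correlation 0.9.
s2r = 0.00153
cov_rq, s2q = s2r, s2r / 0.9**2
s2a, s2d, s2e = 0.4, 0.1, 0.5
s2w = s2e + 0.5 * s2a + 0.75 * s2d                 # full-sib within-family variance
n, f = 20, 50

S = f * info_matrix_ad(n, s2a, s2d, s2w, s2r, s2q, cov_rq)
var_a_joint = np.linalg.inv(S)[0, 0]                              # sigma_D^2 also estimated
var_a_known_d = np.linalg.inv(S[np.ix_([0, 2], [0, 2])])[0, 0]    # sigma_D^2 treated as known
inflation = var_a_joint / var_a_known_d
print(f"condition number of S: {np.linalg.cond(S):.3g}")
print(f"var(sigma_A^2-hat) inflation from also fitting sigma_D^2: {inflation:.1f}")
```

Under these assumptions the variance of $\hat\sigma_A^2$ is inflated roughly five-fold, in line with $1/(1 - 0.9^2) \approx 5.3$ for two nearly collinear regressors.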

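Appendix 3: Comparison of between and within family estimators

Let us assume that a balanced one-way ANOVA (which is also REML if there are no unbalanced fixed effects) is used to estimate $\sigma_A^2$, i.e. $\hat\sigma_{Ab}^2 = (\mathrm{MSB} - \mathrm{MSW})/(nA)$, where MSB and MSW are the between- and within-family mean squares and $A$ is the pedigree relationship (½ or ¼). It is assumed that there is no environmental correlation among sibs. Hence, with $f$ families each of size $n$, $\mathrm{var}(\mathrm{MSB}) = 2(\sigma_W^2 + nA\sigma_A^2)^2/(f-1)$, $\mathrm{var}(\mathrm{MSW}) = 2\sigma_W^4/[f(n-1)]$ and, as these are uncorrelated,

$$\mathrm{var}(\hat\sigma_{Ab}^2) = \frac{2\sigma_W^4}{(nA)^2}\left\{\frac{[1 + nA\sigma_A^2/\sigma_W^2]^2}{f-1} + \frac{1}{f(n-1)}\right\}.$$

For the within-family estimates, $\mathrm{var}(\hat\sigma_{Aw}^2)$ is given by (3). Further simplification requires making some assumptions about the numbers and sizes of families. As a first approximation, assume neither is small, so

$$\mathrm{var}(\hat\sigma_{Ab}^2) \approx \frac{2\sigma_W^4[1 + nA\sigma_A^2/\sigma_W^2]^2}{fn^2A^2} \quad\text{and}\quad \mathrm{var}(\hat\sigma_{Aw}^2) \approx \frac{2\sigma_W^4}{fn^2\sigma_R^2},$$

so that

$$\frac{\mathrm{var}(\hat\sigma_{Aw}^2)}{\mathrm{var}(\hat\sigma_{Ab}^2)} \approx \frac{A^2/\sigma_R^2}{[1 + nA\sigma_A^2/\sigma_W^2]^2}.$$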